Search results for "004 Data processing"
showing 10 items of 11 documents
NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction
2013
Computing Euclidean Steiner trees over segments
2020
In the classical Euclidean Steiner minimum tree (SMT) problem, we are given a set of points in the Euclidean plane and we are supposed to find the minimum length tree that connects all these points, allowing the addition of arbitrary additional points. We investigate the variant of the problem where the input is a set of line segments. We allow these segments to have length 0, i.e., they are points and hence we generalize the classical problem. Furthermore, they are allowed to intersect such that we can model polygonal input. As in the GeoSteiner approach of Juhl et al. (Math Program Comput 10(2):487–532, 2018) for the classical case, we use a two-phase approach where we construct a superse…
Rule Extraction From Binary Neural Networks With Convolutional Rules for Model Validation.
2020
Classification approaches that allow to extract logical rules such as decision trees are often considered to be more interpretable than neural networks. Also, logical rules are comparatively easy to verify with any possible input. This is an important part in systems that aim to ensure correct operation of a given model. However, for high-dimensional input data such as images, the individual symbols, i.e. pixels, are not easily interpretable. Therefore, rule-based approaches are not typically used for this kind of high-dimensional data. We introduce the concept of first-order convolutional rules, which are logical rules that can be extracted using a convolutional neural network (CNN), and w…
HECTOR : a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data
2014
Background Current-generation sequencing technologies are able to produce low-cost, high-throughput reads. However, the produced reads are imperfect and may contain various sequencing errors. Although many error correction methods have been developed in recent years, none explicitly targets homopolymer-length errors in the 454 sequencing reads. Results We present HECTOR, a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. In this algorithm, for the first time we have investigated a novel homopolymer spectrum based approach to handle homopolymer insertions or deletions, which are the dominant sequencing errors in 454 pyrosequencing reads. We have evaluat…
Studying the evolution of neural activation patterns during training of feed-forward ReLU networks
2021
The ability of deep neural networks to form powerful emergent representations of complex statistical patterns in data is as remarkable as imperfectly understood. For deep ReLU networks, these are encoded in the mixed discrete–continuous structure of linear weight matrices and non-linear binary activations. Our article develops a new technique for instrumenting such networks to efficiently record activation statistics, such as information content (entropy) and similarity of patterns, in real-world training runs. We then study the evolution of activation patterns during training for networks of different architecture using different training and initialization strategies. As a result, we see …
Parallel and scalable short-read alignment on multi-core clusters using UPC++
2016
[Abstract]: The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the …
Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability
2016
CheS-Mapper - Chemical Space Mapping and Visualization in 3D
2012
Abstract Analyzing chemical datasets is a challenging task for scientific researchers in the field of chemoinformatics. It is important, yet difficult to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects. To that respect, visualization tools can help to better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity, by selecting which f…
CUSHAW3: Sensitive and Accurate Base-Space and Color-Space Short-Read Alignment with Hybrid Seeding
2014
The majority of next-generation sequencing short-reads can be properly aligned by leading aligners at high speed. However, the alignment quality can still be further improved, since usually not all reads can be correctly aligned to large genomes, such as the human genome, even for simulated data. Moreover, even slight improvements in this area are important but challenging, and usually require significantly more computational endeavor. In this paper, we present CUSHAW3, an open-source parallelized, sensitive and accurate short-read aligner for both base-space and color-space sequences. In this aligner, we have investigated a hybrid seeding approach to improve alignment quality, which incorp…
Extending the semiotics of embodied interaction to blended spaces.
2015
In this paper, we develop a new way of understanding interactions in blended spaces. We do this by developing ideas about embodied semiotics and then apply these ideas to the analysis of interaction in mixed-reality blended spaces (where the physical world and digital world are blended deliberately to provide new forms of interaction). We discuss how blended spaces provide a new medium within which people have experiences. The semiotic analysis reveals how blended spaces are constructed across the physical and the digital, highlighting the ontology, topology, volatility, and agency present within them. It shows how people move between the physical and digital spaces through the objects and …